Add support for large file uploads in multi-part mode (#106)
- Implement LFS support for large files
- Add command to enable LFS in repos
- Create multipart upload functionality
- Update CLI documentation for new commands
- Revise README for command usage examples

`csghub-cli upload` creates the repo and its branch if they do not exist. The default branch is `main`. To upload to a specific branch, use the `--revision` option: if the branch does not exist, it is created; otherwise the files are uploaded to the existing branch.

`csghub-cli upload` limits file size to 4GB. To upload larger files, use the `csghub-cli upload-large-folder` command.

When using the `upload-large-folder` command to upload a folder, the upload progress will be recorded in the `.cache` folder within the upload directory to support resumable uploads. Do not delete the `.cache` folder before the upload is complete.
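
To illustrate why the `.cache` folder must survive until the upload finishes, here is a minimal sketch of one way a progress record can make an upload resumable. It is not the `csghub-cli` implementation: the file name `progress.json`, the record format, and the `send` callback are all hypothetical, and only the `.cache` folder name comes from the text above.

```python
import json
from pathlib import Path

CACHE_DIR_NAME = ".cache"        # folder name taken from the text above
PROGRESS_FILE = "progress.json"  # hypothetical file name, for illustration only


def load_done(folder: Path) -> set:
    """Return the relative paths already recorded as uploaded."""
    record = folder / CACHE_DIR_NAME / PROGRESS_FILE
    if record.exists():
        return set(json.loads(record.read_text()))
    return set()


def mark_done(folder: Path, done: set) -> None:
    """Persist the set of uploaded paths inside the cache folder."""
    cache = folder / CACHE_DIR_NAME
    cache.mkdir(exist_ok=True)
    (cache / PROGRESS_FILE).write_text(json.dumps(sorted(done)))


def resumable_upload(folder: Path, send) -> list:
    """Upload every file not yet recorded; `send` stands in for the real transfer."""
    done = load_done(folder)
    sent = []
    for path in sorted(folder.rglob("*")):
        rel = path.relative_to(folder).as_posix()
        # Skip directories, the progress record itself, and finished files.
        if not path.is_file() or rel.startswith(CACHE_DIR_NAME + "/") or rel in done:
            continue
        send(path)
        done.add(rel)
        mark_done(folder, done)  # record after each file so an interruption loses little
        sent.append(rel)
    return sent
```

In this sketch, deleting the cache folder mid-upload would erase the record, so the next run would start again from the first file.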

Download location is `~/.cache/csg/` by default.

For detailed command line usage examples, including downloading models/datasets, uploading files/folders, and managing inference/fine-tuning instances, please refer to our [CLI documentation](doc/cli.md).

## Use cases of SDK

For more detailed instructions, including API documentation and usage examples, see the use cases below.

### Download model

```python
from pycsghub.snapshot_download import snapshot_download

# Replace with your own access token
token = "your_access_token"

endpoint = "https://hub.opencsg.com"
repo_id = "OpenCSG/csg-wukong-1B"
# Local directory where the downloaded files are stored
cache_dir = "/Users/hhwang/temp/"
result = snapshot_download(repo_id, cache_dir=cache_dir, endpoint=endpoint, token=token)
```

### Download model files that match `*.json` but not `*_config.json`

```python
from pycsghub.snapshot_download import snapshot_download

# Replace with your own access token
token = "your_access_token"

endpoint = "https://hub.opencsg.com"
repo_id = "OpenCSG/csg-wukong-1B"
cache_dir = "/Users/hhwang/temp/"
# Download only JSON files, except those ending in _config.json
allow_patterns = ["*.json"]
ignore_patterns = ["*_config.json"]
result = snapshot_download(
    repo_id,
    cache_dir=cache_dir,
    endpoint=endpoint,
    token=token,
    allow_patterns=allow_patterns,
    ignore_patterns=ignore_patterns,
)
```
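
The following sketch shows how the two pattern lists combine, using only the Python standard library. It is an illustration, not the SDK's own filtering code: it assumes the common convention that a file is kept when it matches at least one allow pattern and no ignore pattern, and the file listing is hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(names, allow_patterns=None, ignore_patterns=None):
    """Keep names that match any allow pattern and no ignore pattern."""
    selected = []
    for name in names:
        if allow_patterns and not any(fnmatchcase(name, p) for p in allow_patterns):
            continue
        if ignore_patterns and any(fnmatchcase(name, p) for p in ignore_patterns):
            continue
        selected.append(name)
    return selected


# Hypothetical repository listing, for illustration only
files = ["config.json", "generation_config.json", "tokenizer.json", "model.safetensors"]
kept = select_files(files, allow_patterns=["*.json"], ignore_patterns=["*_config.json"])
print(kept)  # ['config.json', 'tokenizer.json']
```

Note that `config.json` is kept because `*_config.json` requires an underscore before `config`.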
### Download dataset

```python
from pycsghub.snapshot_download import snapshot_download

token = "xxxx"
endpoint = "https://hub.opencsg.com"
repo_id = "AIWizards/tmmluplus"
# repo_type selects a dataset repo for this download
repo_type = "dataset"
cache_dir = "/Users/xiangzhen/Downloads/"
result = snapshot_download(repo_id, repo_type=repo_type, cache_dir=cache_dir, endpoint=endpoint, token=token)
```

Before starting, please make sure you have Git-LFS installed (see [here](https://git-lfs.github.com/) for installation instructions).

```python
from pycsghub.repository import Repository

# Replace with your own access token
token = "your access token"

r = Repository(
    repo_id="wanghh2003/ds15",
    upload_path="/Users/hhwang/temp/bbb/jsonl",
    user_name="wanghh2003",
    token=token,
    repo_type="dataset",
)

r.upload()
```
### Upload the local path to the specified path in the repo
Before starting, please make sure you have Git-LFS installed (see [here](https://git-lfs.github.com/) for installation instructions).

```python
from pycsghub.repository import Repository

# Replace with your own access token
token = "your access token"

r = Repository(
    repo_id="wanghh2000/model01",
    upload_path="/Users/hhwang/temp/jsonl",
    # Destination path inside the repo
    path_in_repo="test/abc",
    user_name="wanghh2000",
    token=token,
    repo_type="model",
    # Target branch for the upload
    branch_name="v1",
)

r.upload()
```

### Model loading compatible with Hugging Face

The transformers library accepts a Hugging Face `repo_id` directly and downloads and loads the corresponding model, as shown below:

```python
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('model/repoid')
```

In this code, the Hugging Face Transformers library first downloads the model to a local cache folder, then reads the configuration, and loads the model by dynamically selecting the relevant class for instantiation.

To ensure compatibility with Hugging Face, version 0.2 of the CSGHub SDK now includes the most commonly used features: downloading and loading models. Models can be downloaded and loaded as follows:

```python
from pycsghub.repo_reader import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('model/repoid')
```

This code:

1. Uses `snapshot_download` from the CSGHub SDK to download the related files.
2. Dynamically generates, in batches and through class-name reflection, classes with the same names as those that transformers loads automatically.
3. Assigns each generated class a `from_pretrained` method, so the loaded model is a Hugging Face transformers model.

For detailed SDK usage examples, including model/dataset downloading, file uploading, directory uploading, and Hugging Face compatible model loading, please refer to our [SDK documentation](doc/sdk.md).