Internally we use blob.upload_from_file, which has a default 60-second timeout on the connection (I'm assuming the upload could take longer).
Glad to hear!
(yeah @<1603198134261911552:profile|ColossalReindeer77> I'm with you the override is not intuitive, I'll pass the info to the technical writers, hopefully they can find a way to make it easier to understand)
That's the question I want to raise too,
No file size limit
Let me try to run it myself
to enable access to the s3 bucket. In this case I wonder how clearml sdk gets access to the s3 bucket if it relies on secret access key and access key id.
Right, basically someone needs to configure the "regular" environment variables for boto to use the IAM role. clearml basically uses boto, so it should be transparent. Does that make sense? How do you spin the job on the k8s cluster and how do you configure it?
Since these are temp credentials we need to use the sessi...
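To make the "regular environment variables" point concrete, here is a minimal sketch that reports which credential source boto would most likely pick up from the environment. The variable names are the standard ones boto reads; the helper `describe_aws_credentials` is hypothetical (it is not part of clearml or boto), and the real lookup chain has more steps than this.

```python
import os

# Standard environment variables read by boto3/botocore.
STATIC_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
SESSION_TOKEN = "AWS_SESSION_TOKEN"  # required alongside temporary keys
WEB_IDENTITY = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")  # IAM role on k8s


def describe_aws_credentials(env=None):
    """Hypothetical helper: guess which credential source boto would see.

    This only inspects environment variables; it is a simplification of
    boto's actual credential chain.
    """
    env = os.environ if env is None else env
    if all(env.get(key) for key in STATIC_KEYS):
        if env.get(SESSION_TOKEN):
            return "temporary keys (with session token)"
        return "static access keys"
    if all(env.get(key) for key in WEB_IDENTITY):
        return "IAM role via web identity token"
    return "nothing in the environment"
```

Running `describe_aws_credentials()` inside the job's container is a quick way to see whether the pod actually received the variables you expect before looking at clearml itself.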
In order to clone the Task it needs to complete sync, which implies closing. I guess the use case for execute remotely while still running was not considered. How / why is this your workflow? Specifically how does Jupyter get into the picture?
GiganticTurtle0 you mean the repo for the function itself ?
the default assumes the function is "standalone", you can specify a repo with: @PipelineDecorator.component(..., repo='.')
will take the current folder's repo (i.e. the local one)
you can also specify repo url/commit etc (repo=' https://github/user/repo/repo.git ' ....)
See here:
https://github.com/allegroai/clearml/blob/dd3d4cec948c9f6583a0b69b05043fd60d8c103a/clearml/automation/controller.py#L1931
Hi DepressedChimpanzee34
I think main issue here is slow response time from the API server, I "think" you can increase the number of API server processes, but considering the 16GB, I'm not sure you have the headroom.
At peak usage, how much free RAM do you have on the machine ?
Hi @<1523702307240284160:profile|TeenyBeetle18>
and url of the model refers to local file, not to the remote storage.
Do you mean that in the Model tab when you look into the model details the URL points to a local location (e.g. file:///mnt/something/model) ?
And your goal is to get a copy of that model (file) from your code, is that correct ?
Hi @<1523702786867335168:profile|AdventurousButterfly15>
I am running cross_validation, training a bunch of models in a loop like this:
Use the wildcard or disable it altogether:
task = Task.init(..., auto_connect_frameworks={"joblib": False})
You can also do
task = Task.init(..., auto_connect_frameworks={"joblib": ["realmodelonly.pkl", ]})
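To get a feel for what the filter list does, here is a small sketch that mimics the selection logic with glob-style matching from the standard library. It assumes the patterns behave like shell wildcards; `would_be_logged` is a hypothetical helper for illustration, not the clearml implementation.

```python
from fnmatch import fnmatch


def would_be_logged(filename, patterns):
    """Hypothetical helper: mimic a glob-style auto-logging filter.

    patterns can be True (log everything), False (log nothing),
    or a list of glob-style patterns such as ["final_*.pkl"].
    """
    if patterns is True:
        return True
    if patterns is False:
        return False
    return any(fnmatch(filename, pattern) for pattern in patterns)


# Cross-validation loop output: only the "real" model should match.
saved_files = ["fold_0.pkl", "fold_1.pkl", "realmodelonly.pkl"]
kept = [name for name in saved_files if would_be_logged(name, ["realmodelonly.pkl"])]
```

With the list above, `kept` contains only `realmodelonly.pkl`, which is the behaviour you want when the per-fold models should not be registered.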
creating a dataset with parents worked very well and produced great visuals on the UI!
woot woot!
I tried the squash solution, however this somehow caused a download of all the datasets into my
so this actually works, kind of like git squash, bottom line it will repackage the data from all the different versions into one new version. This means downloading the data from all squashed versions, then repackaging it into a single new version. Make sense ?
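Conceptually the squash can be pictured like the sketch below: walk the versions from oldest to newest, apply each one's changes, and end up with a single flat file listing. The data layout (`added` / `removed` per version) and the function name `squash_versions` are assumptions made for illustration only, this is not how clearml stores datasets internally.

```python
def squash_versions(versions):
    """Merge a chain of dataset versions (oldest first) into one file map.

    Each version is a dict with optional keys:
      "added":   {path: content_hash} for new or modified files
      "removed": iterable of paths deleted in that version
    """
    merged = {}
    for version in versions:
        for path in version.get("removed", ()):
            merged.pop(path, None)
        merged.update(version.get("added", {}))
    return merged


history = [
    {"added": {"a.csv": "h1", "b.csv": "h2"}},
    {"added": {"a.csv": "h3"}, "removed": ["b.csv"]},
    {"added": {"c.csv": "h4"}},
]
squashed = squash_versions(history)
```

Since the final version has to physically contain every surviving file, all the contributing versions need to be available locally first, which is why the squash triggers the downloads.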
GreasyPenguin14 thank you! that will make our life a lot easier
ProudMosquito87 Just a few pointers on how we convert the TB histograms to awesome (but less accurate) 3D surfaces.
First I have to admit, I almost never use these histograms, maybe to detect a plateau or if something goes really wrong...
The 3D surface is basically grouping all the histograms and then bucketing them (I think the default is 50 buckets) so that you get a general feel of what's going on, not necessarily a detailed view. Bottom line, you are correct, the TB is the source of truth...
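A rough sketch of the "group then bucket" idea, assuming we start from raw values per iteration (TB histograms are already bucketed, so the real conversion is a re-bucketing). The function name `histograms_to_surface` is hypothetical and this is not the actual clearml code; it only shows why detail is lost when everything is forced onto one shared set of buckets.

```python
def histograms_to_surface(samples_per_iteration, num_buckets=50):
    """Bucket every iteration's values onto one shared axis.

    Returns a grid with one row per iteration and num_buckets columns,
    where each cell is the count of values falling into that bucket.
    Assumes every iteration has at least one value.
    """
    lo = min(min(samples) for samples in samples_per_iteration)
    hi = max(max(samples) for samples in samples_per_iteration)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / num_buckets or 1.0
    surface = []
    for samples in samples_per_iteration:
        row = [0] * num_buckets
        for value in samples:
            # Clamp so the maximum value lands in the last bucket.
            index = min(int((value - lo) / width), num_buckets - 1)
            row[index] += 1
        surface.append(row)
    return surface


grid = histograms_to_surface([[0.0, 1.0], [0.5, 0.5, 1.0]], num_buckets=2)
```

Each row of `grid` is one slice of the surface; with only a fixed number of buckets across all iterations, narrow peaks get smeared, which matches the "general feel, not a detailed view" caveat.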
- Yes the challenge is mostly around defining the interface. Regarding packaging, I'm thinking a similar approach to the pipeline decorator, wdyt?
- Clearml agents will be running on k8s, but the main caveat is that I cannot think of a way to help with the deployment, at the end it will be kubectl that users will have to call in order to spin the containers with the agents, maybe a simple CLI to do that for you?
DefeatedCrab47 no idea, but you are more than welcome to join the thread here, and point it out:
https://github.com/PyTorchLightning/pytorch-lightning-bolts/issues/249
Let's say I don't have the data on my local machine but only in an S3 bucket.
You can still register it, but make sure you do not delete it from the S3 bucket because it will keep a link to it
Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known')': /
what did you put in output_uri ?
This only talks about bug reporting and enhancement suggestions
I'll make sure this is fixed
I don't see any requests
This points to configuration, specifically maybe it is directed to a different server?!
Hi CharmingPuppy6
Basically yes there is.
The way clearml is designed is to have queues abstract different types of resources. For example, a queue for single GPU jobs (let's name it "single_gpu") and a queue for dual GPU jobs (let's name it "dual_gpu").
Then you spin agents on machines and have the agents pull jobs from specific queues based on the hardware they have. For example we can have a 4 GPU machine with 3 agents, one agent connected to 2xGPUs and pulling Tasks from the "dual_gpu...
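As a sketch of that layout, the snippet below builds the `clearml-agent daemon` command lines for a 4 GPU machine. The split of the remaining two GPUs into two single-GPU agents is an assumption (the message above is cut off), and `agent_commands` is a hypothetical helper; only the `--queue`, `--gpus` and `--detached` flags come from the agent CLI.

```python
def agent_commands(assignments):
    """Build one clearml-agent command per (queue, gpu indices) pair."""
    commands = []
    for queue, gpus in assignments:
        gpu_arg = ",".join(str(gpu) for gpu in gpus)
        commands.append(
            f"clearml-agent daemon --queue {queue} --gpus {gpu_arg} --detached"
        )
    return commands


# Assumed layout: one dual-GPU agent plus two single-GPU agents.
layout = [("dual_gpu", [0, 1]), ("single_gpu", [2]), ("single_gpu", [3])]
for command in agent_commands(layout):
    print(command)
```

Users then only pick a queue when enqueuing a Task, and whichever agent is free on matching hardware pulls it.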
Hi GiganticTurtle0
dataset_task = Task.get_task(task_id=dataset.id)
Hmmm I think that when it gets the Task "output_uri" is not updated from the predefined Task (you can obviously set it again).
This seems like a bug that is unrelated to Datasets.
Basically any Task that you retrieve will default to the default output_uri (not the stored one)
EnchantingWorm39 you have great timing ;)
DistressedGoat23 you are correct, since at the end this becomes a plotly object the extra_layout is for general purpose layout, but this specific entry is next to the data. Bottom line, can you open a github issue, so we do not forget to fix? In the meantime you can use the general plotly reporting as SweetBadger76 suggested
OddAlligator72 I like this idea.
The single thing I'm not sure about is the "function entry point"
Why would one do that? Meaning why wouldn't you have a proper python entry-point.
The reason I'm reluctant is that you might have calls/functions/variables in global scope of the file storing the function, and then users will not know why something broke, and it will be very cumbersome to debug.
A simple script entry point seems trivial to launch and debug locally.
What do you think ? What woul...
This would work to load the local modules, but I'm also using poetry and the pyproject.toml is in the subdirectory, so the agent won't install any dependency if I don't set the work_dir
hmmm true, in terms of requirements, you can list them in the decorator (see packages argument)
(as I see, the services worker is only in the services-queue, and not my default queue, where my other servers/workers are)
So basically the service-mode is just a flag passed to the agent, and the services queue is the name of the queue it will pull from.
If i want a normal worker also
You can just add another section to the docker-compose, or run it manually after you spin the docker-compose.
LazyFox65 wdyt ?
Hi @<1573119955400921088:profile|CloudyPelican46>
On what machine is it best practice to run the clean up service, local machine or should it be on the clearml server ?
The easiest is to run it on the server machine itself, even though in practice you can put it anywhere, but most of the time this service is sleeping and not using so much RAM so it kind of makes sense
You put it there, so the assumption is you know what you are looking for, or use glob? wdyt?