DeliciousBluewhale87 out of curiosity, what do you mean by "deployment functionality"? Is it model serving?
So net-net does this mean it's behaving as expected,
It is as expected.
If no "Installed Packages" are listed, then it cannot pull a cached venv (because requirements.txt is not a full env, and it was never analyzed).
It does, however, create a venv cache based on it (after installing it).
The clone of this Task (i.e. right-click "Clone experiment" in the UI, then enqueue it) will use the cached copy, because the full packages are listed in the "Installed Packages" section of the Task.
Make sens...
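To make the caching rule above concrete, here is a minimal sketch of the idea only. It is not the agent's actual hashing scheme: the function name `venv_cache_key` and the choice of SHA-256 over a sorted package list are assumptions made for illustration. The point it shows is that an empty "Installed Packages" list gives nothing to look up, while a full pinned list yields a stable key.

```python
import hashlib


def venv_cache_key(installed_packages, python_version):
    """Return a lookup key for a fully pinned package list.

    Hypothetical helper: returns None when no packages are listed,
    mirroring the case where no cached venv can be matched.
    """
    if not installed_packages:
        return None
    # Normalize so that ordering and letter case do not change the key.
    normalized = sorted(
        line.strip().lower() for line in installed_packages if line.strip()
    )
    payload = "\n".join([python_version] + normalized)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# First run: the Task has no "Installed Packages", so there is no key.
first_run = venv_cache_key([], "3.10")

# Cloned Task: the full package list is present, so the key is stable.
clone_run = venv_cache_key(["numpy==1.26.4", "requests==2.31.0"], "3.10")
same_clone = venv_cache_key(["requests==2.31.0", "numpy==1.26.4"], "3.10")
```

Under these assumptions, `first_run` is `None` and the two cloned lists map to the same key, which is why only the clone can hit the cache.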
It seems you are getting 401 Unauthorized. Is this the same domain? I'm assuming the issue is that the logged-in cookie is not sent?
You can see in the log that it tries to download an artifact from a specific IP:URL. Is that link a valid one?
(this seems like the main cause of the error, first line in the screenshot)
Hi @<1533620191232004096:profile|NuttyLobster9>
First, nice workaround!
Second, could you send the full log? When the venv is skipped, then pytorch resolving should be skipped as well, and no error should be raised...
And lastly, could you also send the log of the task that executed correctly (the one you cloned)? Because you are correct, it should have been the same.
ohh, the copy paste thing when you generate credentials ?
VexedCat68
. So the checkpoints just added up. I've stopped the training for now. I need to delete all of those checkpoints before I start training again.
Are you uploading the checkpoints manually with artifacts, or is it auto-logged & uploaded?
Also, why not reuse and overwrite older checkpoints?
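One way to reuse and overwrite older checkpoints is to write them into a small, fixed set of file names. The following is a rough sketch under assumptions: the helper names `checkpoint_path` and `save_checkpoint`, the `keep=3` slot count, and the `.bin` payload are all hypothetical, and a real training loop would call its framework's own save function instead of writing raw bytes.

```python
import os
import tempfile


def checkpoint_path(directory, epoch, keep=3):
    """Map an epoch to one of `keep` rotating file names (hypothetical scheme)."""
    slot = epoch % keep
    return os.path.join(directory, f"checkpoint_{slot}.bin")


def save_checkpoint(directory, epoch, payload, keep=3):
    """Write the payload, overwriting whatever occupied this slot before."""
    path = checkpoint_path(directory, epoch, keep)
    with open(path, "wb") as f:
        f.write(payload)
    return path


# Simulate ten epochs; only three files ever exist on disk.
workdir = tempfile.mkdtemp()
for epoch in range(10):
    save_checkpoint(workdir, epoch, f"weights at epoch {epoch}".encode())

saved_files = sorted(os.listdir(workdir))
```

With this layout the number of stored checkpoints stays bounded by `keep`, so nothing needs to be deleted by hand before the next training run.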
MelancholyElk85 that looks great, let me see how quickly we can push it (I think 1.1.5 needs to be pushed very soon, I'll check if we can have it before)
YummyWhale40 you mean like continue training?
https://github.com/allegroai/trains/issues/160
This is odd, how are you spinning clearml-serving ?
You can also do it synchronously :
predict_a = self.send_request(endpoint="/test_model_sklearn_a/", version=None, data=data)
predict_b = self.send_request(endpoint="/test_model_sklearn_b/", version=None, data=data)
I think it's supposed to be out early Nov
@<1523722618576834560:profile|ShaggyElk85> nice !
I think that in theory you can run the DBs' arm64 images, no?
thought the agent created a new conda env and installed all packages
It does, but I was asking what is written on the original Task (the one created when you executed the code on your laptop, not when the agent was executing it). When the agent is executing the Task, it writes back all the packages of the entire venv it created; when the Task is run manually, it will list only the packages you import directly (i.e. from package or import package; it actually analyses the code).
My point...
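As a rough illustration of what "only the packages you import directly" means, the sketch below walks a script's syntax tree and collects top-level package names. It is an assumption-laden approximation, not ClearML's actual analysis: the function name `directly_imported_packages` is hypothetical, and it does not resolve versions or separate standard-library modules from installed ones.

```python
import ast


def directly_imported_packages(source):
    """Collect top-level package names from import statements in `source`."""
    tree = ast.parse(source)
    packages = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import a.b.c" contributes the top-level name "a".
            for alias in node.names:
                packages.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports ("from . import x"): they are local code.
            if node.level == 0 and node.module:
                packages.add(node.module.split(".")[0])
    return sorted(packages)


example_script = """
import numpy as np
from sklearn.linear_model import LogisticRegression
from . import local_helper
import os.path
"""

found = directly_imported_packages(example_script)
```

Anything pulled in only as a dependency of `numpy` or `sklearn` would not appear in `found`, which is the difference from the full venv listing the agent writes back.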
link to the line please
I thought the dataset was only linked to the fileserver and not to the specific url used to upload it.
ShinyRabbit94 yep exactly! The idea is that you can actually do the storage on any solution (S3/GS etc.); the file server is just the default one
@<1539780258050347008:profile|CheerfulKoala77> make sure the AMI id matches the zone of the EC2 machine
Hi GrotesqueOctopus42
In theory it can be built, the main hurdle is getting elk/mongo/redis containers for arm64 ...
If this is the case, then we do not change the matplotlib backend
Also
I've attempted converting the mpl image to PIL and use report_image to push the image, to no avail.
What are you getting? error / exception ?
So just to be clear - the file server has nothing to do with the storage?
Think of it as a quick and dirty "minio", storing files and serving them over http. If you have minio (or any object storage) you can replace it altogether
i'm working on creating a custom config with istio
That is awesome! Let me know if we could help
Also please consider PRing it, I'm sure other users will appreciate the option
Hi ShinyWhale52
Every execution of the pipeline (by definition) will create a new job based on the pipeline steps
This is the reason you see all the steps twice (the default assumption is that you wish to re-run the step, as this is part of the processing workflow, e.g. training a model).
the model has been overwritten. I guess this is due to this instruction:
This is because you are storing it locally to the same path, it just reflects the fact you just overwrote your model.
To create a...
Hi AbruptCow41
I just want them to be able to write in them without them appearing in their clearml.conf or in their environment variables.
So where would they put them ? (or is it pre baked into the docker?)
Hi LivelyLion31
Yes, that is exactly why we designed Trains with an automagic integration: so users do not need to learn another package, and with almost no effort you get most of the benefits.
Regarding the TB files, from experience most users will use the TB files shortly after they executed the experiment, usually for debugging and in-depth capabilities (like network debugger profile etc.); metric view is something that is much easier to do on a centralized server (like on...
understood trains does not have auto versioning
What do you mean auto versioning ?
task name is not unique, task ID is unique, you can have multiple tasks with the same name and you can edit the name post execution
can you bump me to that thread?
https://clearml.slack.com/archives/CTK20V944/p1630610430171200
I realise I'll need to catalogue all the dataset ids created by ppl separately on a spreadsheet.
Okay this part I missed, why would you need to add additional "catalog" when you have the UI?